
BMC Medical Genomics

Springer Science and Business Media LLC

Preprints posted in the last 30 days, ranked by how well they match BMC Medical Genomics's content profile, based on 36 papers previously published here. The average preprint has a 0.05% match score for this journal, so anything above that is already an above-average fit.

1
Integrative multi-cohort analysis reveals consistent sex differences in gut microbiota of multiple sclerosis patients

Soler-Saez, I.; Galiana-Rosello, C.; Grillo-Risco, R.; Falony, G.; Tepavčević, V.; Vieira Silva, S.; Garcia-Garcia, F.

2026-04-22 neuroscience 10.64898/2026.04.17.719247 medRxiv
Top 0.1%
3.6%

Biological sex is a key determinant in the onset and progression of multiple diseases. In multiple sclerosis (MS), females exhibit higher disease prevalence, earlier onset, and more pronounced inflammatory activity, whereas males tend to experience a more severe neurodegenerative course, characterized by accelerated central nervous system damage and increased brain atrophy. The gut microbiome has emerged as a critical factor in MS, as its composition can either ameliorate or exacerbate disease progression. In this study, we aimed to identify reproducible sex-associated differences in gut microbial composition across independent cohorts of MS patients. Through a systematic search we identified six independent studies based on 16S rRNA gene sequencing, comprising a total of 337 samples. Despite substantial inter-study variability, sex-associated differences were more pronounced in MS patients than in healthy controls. We identified 11 microbial taxa showing significant sex-associated differences in MS, nine enriched in females and two in males. Notably, the female-enriched taxa Eggerthella and Eisenbergiella were associated with specific MS subtypes and higher disability. To facilitate the use of our findings by the scientific community, we developed a freely accessible web-based tool that provides full access to our results. Thus, in this work we identified consistent and reproducible sex differences in the gut microbiota of MS patients, highlighting the importance of incorporating sex as a critical variable in microbiome research, with potential implications for understanding disease heterogeneity in MS.

Importance: Multiple sclerosis (MS) affects females and males differently, but the biological reasons behind these differences are not fully understood. One potential factor is the gut microbiome (i.e., the community of microorganisms living in our intestines), which can influence immune function and disease progression. In this study, we analyzed data from multiple independent cohorts and found consistent differences in gut microbial composition between female and male MS patients. Notably, certain bacteria were more abundant in females and were linked to more severe disease features. We also developed a freely accessible web tool where researchers can explore the complete findings in detail. Our results highlight the importance of considering sex as a key factor in microbiome research and may help guide more personalized approaches to understanding and treating MS.

2
Proteomic Insights into Lp(a) Cardiovascular Mechanisms: A Mendelian Randomization Study

Tomasi, J.; Xu, H.; Zhang, L.; Carey, C. E.; Schoenberger, M.; Yates, D. P.; Casas, J.

2026-04-22 genetic and genomic medicine 10.64898/2026.04.20.26351299 medRxiv
Top 0.3%
2.1%

Background: Elevated lipoprotein(a) [Lp(a)] is a known risk factor for several cardiovascular-related diseases, as established by multiple genetic and observational studies. However, the underlying mechanisms mediating the effects of Lp(a) levels on cardiovascular disease risk and major adverse cardiovascular events (MACE) are unclear. The aim of this study was to identify proteins downstream of Lp(a) using Mendelian randomization (MR), a genetic causal inference approach. Methods: A two-sample MR was performed by first identifying Lp(a) genetic instruments from genome-wide association studies (GWAS) of Lp(a) blood concentrations. These instruments were then tested for association with proteins using proteomic pQTL data (Olink from UK Biobank, 2,940 proteins; SomaScan from deCODE, 4,907 proteins). Results: A total of 521 proteins associated with Lp(a) were identified. Pathway enrichment analysis identified the following MACE-relevant pathways, comprising a total of 91 Lp(a) downstream proteins: oxidized phospholipid-related, chemotaxis of immune cells and endothelial cell activation, pro-inflammatory monocyte activation, neutrophil activity, coagulation, and lipid metabolism. Conclusion: The results suggest that Lp(a) treatments act primarily by modifying inflammation rather than by lowering lipids, thus providing insight into the mechanistic framework that mediates the effects of elevated Lp(a) on atherosclerotic cardiovascular disease.
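For readers unfamiliar with two-sample MR, the sketch below shows one common way per-variant estimates are combined: a Wald ratio for each instrument, pooled by fixed-effect inverse-variance weighting (IVW). The abstract does not say which estimator the authors used, so IVW is an assumption here, and the function names and input numbers are purely illustrative.

```python
import math

def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Per-variant causal estimate with a first-order standard error.

    beta_exposure: variant effect on the exposure (e.g. Lp(a) level)
    beta_outcome:  variant effect on the outcome (e.g. a protein level)
    se_outcome:    standard error of beta_outcome
    """
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se

def ivw_estimate(variants):
    """Fixed-effect IVW pooling of Wald ratios.

    variants: iterable of (beta_exposure, beta_outcome, se_outcome) tuples.
    Returns the pooled estimate and its standard error.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for beta_x, beta_y, se_y in variants:
        estimate, se = wald_ratio(beta_x, beta_y, se_y)
        weight = 1.0 / se ** 2
        weighted_sum += weight * estimate
        weight_total += weight
    return weighted_sum / weight_total, math.sqrt(1.0 / weight_total)

# Hypothetical summary statistics for two instruments (not from the study).
example_variants = [(0.2, 0.1, 0.02), (0.4, 0.2, 0.02)]
pooled, pooled_se = ivw_estimate(example_variants)
```

In this toy input both variants imply the same ratio, so the pooled estimate equals that ratio and the second, stronger instrument simply carries more weight.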

3
The Power of Partnership: Democratizing Genetic Prevalence to Empower Patient Advocacy

Baxter, S. M.; Singer-Berk, M.; Glaze, C.; Russell, K.; Grant, R. H.; Groopman, E.; Lee, J.; Watts, N.; Wood, J. C.; Wilson, M.; Rare As One Network, ; Rehm, H. L.; O'Donnell-Luria, A.

2026-03-31 genetic and genomic medicine 10.64898/2026.03.30.26349539 medRxiv
Top 0.3%
2.0%

Introduction: Accurate estimation of disease prevalence is crucial for public health and therapeutic development, but traditional methods are often inaccurate. Genetic prevalence, which estimates the proportion of a population with a causal genotype using allele frequencies from population data, offers an important alternative. Methods: We partnered with 18 Rare As One patient organizations to estimate genetic prevalence for 22 autosomal recessive conditions using population data from two releases of the Genome Aggregation Database (gnomAD). To standardize and democratize these analyses, we developed the Genetic Prevalence Estimator (GeniE), a publicly available tool for accessible calculations. Results: Conservative carrier frequencies in gnomAD v4.1 ranged from 1/164 to 1/11,888. The median change in genetic prevalence frequency between v2.1 and v4.1 was 0.806. Partnership with patient advocacy groups provided critical real-world context that refined the interpretation of these estimates. Discussion: These findings highlight that genetic prevalence is not a static figure but a dynamic, evolving measure with important caveats that need to be considered. Our study underscores the necessity of re-evaluations as databases expand. By integrating patient-partnered insights with the GeniE platform, we empower the genomics community to maintain transparent, up-to-date, and actionable data for rare disease advocacy and drug development.
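As a rough illustration of how genetic prevalence can be derived from allele frequencies for an autosomal recessive condition, the sketch below applies the textbook Hardy-Weinberg approximation: sum the pathogenic allele frequencies into q, then take 2q(1 - q) as the carrier frequency and q² as the expected affected proportion. Whether GeniE computes its estimates exactly this way is not stated in the abstract, so treat the function names and the example frequencies as assumptions.

```python
def recessive_prevalence(allele_frequencies):
    """Hardy-Weinberg estimate for an autosomal recessive condition.

    allele_frequencies: population frequencies of the pathogenic alleles
    in one gene (assumed independent and individually rare).
    Returns (carrier_frequency, genetic_prevalence).
    """
    q = sum(allele_frequencies)      # cumulative pathogenic allele frequency
    carrier_frequency = 2 * q * (1 - q)
    genetic_prevalence = q * q       # biallelic (affected) genotypes
    return carrier_frequency, genetic_prevalence

def one_in(frequency):
    """Express a frequency as the N in '1 in N'."""
    return round(1 / frequency)

# Hypothetical allele frequencies, not taken from gnomAD.
carrier, affected = recessive_prevalence([0.001, 0.002])
summary = f"carriers: 1 in {one_in(carrier)}, affected: 1 in {one_in(affected)}"
```

Because the result depends on q², small changes in which variants are counted as pathogenic, or in their frequencies between database releases, can shift the estimate noticeably, which is consistent with the abstract's point that genetic prevalence is not a static figure.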

4
Obesity-related alterations in plasma metabolomics and fecal microbiota in Down syndrome Dp(16)1Yey mice

Halder, P.; Selloum, M.; Ichou, F.; Lindner, L.; Desnouveaux, L.; Lejeune, F.-X.; Pavlovic, G.; Herault, Y.; Potier, M.-C.

2026-04-16 neuroscience 10.64898/2026.04.10.717726 medRxiv
Top 0.4%
1.7%

Background/Objectives: Individuals with Down syndrome (DS) are at increased risk of obesity and metabolic comorbidities, yet the mechanisms underlying these conditions remain unclear. Here we investigated how the DS-associated genetic condition interacts with diet and metabolic pathways in the Dp(16)1Yey mouse model of DS. Methods: Untargeted plasma metabolomics was performed in Dp(16)1Yey and control mice fed either a control or a high-fat diet (HFD). Raw data were processed, and features were annotated. Statistical analyses were conducted in R, and pathway analysis was performed with MetaboAnalyst v5.0. The fecal microbiome was profiled using 16S rRNA sequencing and analyzed using phyloseq in R. Results: Diet exerted the strongest effect on the mouse plasma metabolome, followed by sex and genotype. Seventy-five diet-responsive metabolites were enriched in amino acid and nucleotide metabolism. Genotype-driven changes affected 34 metabolites, notably impacting amino acid and taurine-hypotaurine metabolism. Fifty-six sex-associated metabolites highlighted disruptions in aromatic amino acid biosynthesis and pyrimidine metabolism. A significant Diet*Genotype interaction was observed for five metabolites, including a marked reduction in the microbiota-derived metabolite 3-indolepropionic acid (IPA) in Dp(16)1Yey mice on HFD. Both genotype and diet exerted pronounced effects on the fecal microbiome, with selective depletion of the IPA-producing Clostridia in Dp(16)1Yey mice under HFD. Conclusion: Segmental trisomy in Dp(16)1Yey mice modulates the host metabolic response to dietary fat, partly through microbiota-derived metabolites such as IPA. These findings highlight the importance of genotype, diet, and microbiome interactions in shaping metabolic disease risk in DS and point toward microbiota-targeted dietary interventions.

5
Identification and Analysis of Novel RNA Editing Sites in Neurodegenerative Diseases Using Machine Learning Approaches.

Jabin, S.; Natarajan, E.

2026-04-13 neuroscience 10.64898/2026.04.09.716726 medRxiv
Top 0.5%
1.7%

Background: RNA editing is a post-transcriptional modification that alters the sequence of an RNA transcript. Two types of RNA editing have been found in mammals, involving the enzymatic deamination of either adenosine to inosine (A-to-I) or cytidine to uridine (C-to-U). A-to-I editing, the most common form, is mediated by the ADAR (adenosine deaminases acting on RNA) family of enzymes: ADAR1, ADAR2, and ADAR3. The editing event alters the hydrogen-bond pairing of nucleobases, and the edited site is read as guanosine rather than the original adenosine. RNA editing deregulation has been linked to several nervous system and neurodegenerative diseases. This project focuses on Alzheimer's disease (AD), using samples from the anterior cingulate cortex of human brain tissue. AD is the leading form of dementia worldwide and a neurodegenerative condition prevalent in the elderly. Methodology: A total of 20 raw RNA-sequencing samples (10 controls and 10 AD cases) were collected from NCBI using the SRA Toolkit. Quality assessment was performed using FastQC, and reads were processed using Trimmomatic. Alignment was done using the STAR RNA-seq aligner. RNA editing detection was performed using REDItools, and detected sites were subsequently annotated against the REDIportal database. The resulting control-specific and disease-specific novel editing sites were merged into a single dataset containing exclusively novel, group-specific A-to-I editing events. This merged dataset was then used for downstream feature extraction and machine learning analysis. Probability-based filtering was applied to extract high-confidence disease-associated sites, and their gene list was used for computational biological validation, pathway and functional enrichment analysis, and overlap with known AD loci. Results: Random Forest showed the highest accuracy (0.804) and ROC-AUC (0.854). The most important features differentiating control and disease novel sites in the Random Forest model were coverage (~0.35), editing level (~0.33), and GC content (~0.15). Mean AEI values were higher in both male and female disease cases (~0.48-0.50) than in male and female controls (~0.14-0.21). Mean ADAR1_CPM was higher in controls (123.65-143.30) than in disease cases (88.35-97.93), ADAR2_CPM was nearly equal across all cases (~3.7-4.7), and ADAR3_CPM was very low in all cases (~0-0.02). Most candidate editing sites were located in exons (~62-67%), followed by CDS regions (~17-21%) and a relatively smaller fraction in gene regions (~15-16%). Editing alterations preferentially affect molecular systems governing synaptic structure, neurotransmission, and central nervous system integrity. In the main set, of the 2,576 high-confidence genes identified, 33 overlapped with AD GWAS loci. In the core set, of the 1,367 high-confidence genes identified, 11 overlapped with AD GWAS loci. Conclusion: Features such as coverage, editing level, and GC content contributed most. Alu sites are negligible compared with non-Alu sites, but mean AEI values are higher in disease cases than in controls. Mean ADAR1_CPM is higher than ADAR2_CPM and ADAR3_CPM. Sex does not play a major role. High-confidence disease-associated RNA editing sites are strongly biased toward transcript-centric regions, particularly exons, with a notable subset affecting coding sequences. Importantly, enrichment of neurodegeneration-associated pathways and cognition-related human phenotypes further supports the disease relevance of these gene networks. RNA editing events in the Alzheimer's cortex may represent a regulatory mechanism largely independent of inherited genetic susceptibility loci.

6
Proteogenomic analysis of 5,411 plasma proteins in sickle cell disease patients

Groza, C.; Chignon, A.; Lo, K. S.; Bellegarde, V.; Bartolucci, P.; Lettre, G.

2026-04-07 genetic and genomic medicine 10.64898/2026.04.06.26350255 medRxiv
Top 0.5%
1.7%

There are few therapeutic options to treat patients with sickle cell disease (SCD), a blood disorder caused by mutations in the β-globin gene that affects >7M individuals worldwide. Combining human genetics and high-throughput proteomics can help identify new drug targets. Here, we present results from a proteogenomic analysis of the plasma proteome in SCD patients. We measured the levels of 5,411 plasma proteins and tested their associations with common genetic variation in 343 SCD patients. After conditional analyses, we identified 560 protein quantitative trait loci (pQTL), including 58 (10%) that are novel. Many of these pQTL are not specific to SCD patients and associate with clinically relevant traits in non-SCD African Americans from the Million Veteran Program (e.g. hemoglobin concentration, triglycerides). The effect sizes of the pQTL are largely concordant between SCD and non-SCD individuals, although we found examples (e.g. APOL1, haptoglobin) with evidence of heterogeneity that suggests an interaction between the plasma proteome and the SCD genotype. Finally, we combine pQTL and genome-wide association study results for fetal hemoglobin (HbF) in a Mendelian randomization analysis to prioritize five proteins that may increase HbF production (ENPP5, LBP, NAAA, PT3X, ZP3).

7
The Effect of Vitamin-D Supplementation on HDAC2 Levels in Stable COPD Patients

Donastin, A.; Irawan, D.; Effendy, E.; Iryawan, R. D. A.; Nuari, N.; Oktaviana, B. M.; Yahya, D.; Muhammad, A. R.

2026-04-08 respiratory medicine 10.64898/2026.04.05.26348641 medRxiv
Top 0.5%
1.7%

Background: Chronic Obstructive Pulmonary Disease (COPD) is the third leading cause of global mortality, with persistent lung inflammation contributing to disease progression. This inflammation is partly associated with reduced levels of histone deacetylase 2 (HDAC2). Previous studies suggest that Vitamin D may modulate HDAC2 levels. This experimental study aimed to evaluate the effect of Vitamin D supplementation on HDAC2 expression in stable COPD patients at Jemursari Islamic Hospital. Methods: Five COPD patients received a daily dose of 5000 IU of Vitamin D for three months. Serum levels of 25(OH)D3 and HDAC2 were measured before and after the intervention. Results: Vitamin D supplementation resulted in a significant increase in both 25(OH)D and HDAC2 levels. Pulmonary function parameters showed an increasing trend; however, no statistically significant differences were observed. Conclusion: Vitamin D supplementation was associated with increased HDAC2 levels, suggesting a potential anti-inflammatory effect. However, no significant improvement in pulmonary function was observed. Further studies are needed to determine its clinical impact.

8
Evaluation of somatic variant calling methods on high coverage tumour-only amplicon sequencing data in a clinical environment

Bharne, D.; Gaston, D.

2026-04-11 bioinformatics 10.64898/2026.04.08.717310 medRxiv
Top 0.6%
1.4%

Amplicon-based targeted sequencing panels are one of the current workhorses of next-generation sequencing in clinical molecular diagnostics laboratories for profiling somatic mutations in tumours. Many open-source somatic variant callers are available; however, their use in clinical applications remains underexplored. Therefore, we integrated the outputs of six variant callers (FreeBayes, MuTect2, Pisces, Platypus, VarDict and VarScan) into a Snakemake pipeline and evaluated tumour-only data from the HD789 commercial reference standard, sequenced in triplicate on three different sequencing runs using the Illumina AmpliSeq Focus panel on MiSeq and NextSeq 2000. A 1:4 dilution sample was sequenced to evaluate the limits of variant detection. The called variants were analysed by depth, allele frequency, and other sequencing metrics. The variant callers were evaluated by their level of concordance and performance on known somatic variants. FreeBayes consistently called the largest number of somatic variants in each sample but also included more potential artifacts. Overall, FreeBayes, VarScan, MuTect2, and Pisces had the best performance on HD789 data.

9
Germline VCF Annotator: a lightweight pipeline for processing germline VCFs with robust variant extraction and read evidence quality control

Manojlovic, Z.

2026-04-09 bioinformatics 10.64898/2026.04.06.716730 medRxiv
Top 0.6%
1.4%

Raw variant calls are typically distributed as VCF files and are not well-suited for direct human review. They are intended for programmatic parsing, and spreadsheet import can distort data through automatic type conversion. Furthermore, variants in VCF are commonly annotated to add gene context and predicted functional consequences. Ensembl VEP, a widely used standard for transcript-aware variant annotation, was adapted in this study to generate standardized consequence fields across genomic features. Using a colon crypt whole-genome sequencing cohort as the motivating dataset, this study examined whether variation at DNA damage response and repair (DDR) loci could contribute to mutation-burden patterns in normal colon crypts, including patterns associated with age and potential treatment-related exposure. To make this question testable in a reproducible table-based format, the Germline VCF Annotator was developed as a two-step workflow that normalizes germline VCFs, generates VEP tabular annotations with explicit allele fields, and then extracts variants of interest and appends read-evidence metrics to assign a rules-based QC class. Within-patient concordance across technical repeats at predefined DDR loci was near-perfect after filtering for nonsilent SNVs with read depth ≥15, with discordance concentrated among Low-QC loci. Bulk and crypt-derived samples showed no age-related trend in DDR burden. Although the demonstration centers on DDR and aging, the Germline VCF Annotator is applicable to other gene sets that require human-readable locus-level summaries with retained allele provenance and read evidence.

10
Translation, Validation, and Application of Indonesian Genetic Literacy Questionnaires for Medical Students

Kemal, R. A.; Dhani, R.; Simanjuntak, A. M.; Rafles, A. I.; Triani, H. X.; Rahmi, T. M.; Akbar, V. A.; Firdaus, F.; Pratama, B. F.; Zulharman, Z.

2026-04-25 medical education 10.64898/2026.04.17.26350524 medRxiv
Top 0.6%
1.3%

Background: The increasing relevance of genetics and molecular biology in medicine necessitates greater genetic literacy among healthcare workers. To assess the literacy level, a validated genetic literacy questionnaire is needed. Therefore, a standardised Indonesian-language genetic literacy questionnaire is essential. Aims: We aimed to translate and validate three genetic literacy questionnaires (PUGGS, iGLAS, and UNC-GKS) for use among Indonesian medical students. We then evaluated genetic literacy levels using one of the validated questionnaires. Methods: The PUGGS, iGLAS, and UNC-GKS questionnaires were translated into Indonesian and then reviewed by an expert panel for translational accuracy and conceptual appropriateness. Back-translation was performed to confirm validity. Initial Indonesian versions of the questionnaires underwent cognitive pre-testing with 12 undergraduate medical students. After refinements, the questionnaires were validated among 34 first- to third-year medical students. The Indonesian version of the UNC-GKS questionnaire was then used to assess the genetic literacy of 486 medical students, comprising 228 preclinical students, 187 clerkship students, and 71 residents. Results: The Indonesian versions of PUGGS (Cronbach's α = 0.819) and UNC-GKS (α = 0.809) demonstrated good reliability, while iGLAS showed poor reliability (α = 0.315). Among the 486 students tested, 56% demonstrated moderate overall genetic literacy, and only 15.2% demonstrated good overall literacy. Basic genetic concepts were relatively well understood, with 54.3% having good literacy. In contrast, the effects of gene variants on health were poorly understood, with only 9.7% having good literacy. Inheritance concepts were moderately understood, with 24.9% having good literacy. Conclusion: The Indonesian translations of PUGGS and UNC-GKS are reliable tools for assessing genetic literacy among medical students. Using UNC-GKS, we observed predominantly moderate genetic literacy levels. Curriculum improvement to better integrate genetics education is essential to support its clinical applications.
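The reliability figures in this abstract are Cronbach's alpha values. As a minimal sketch of how that statistic is computed from item-level responses, the function below uses the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores). The response matrix is made up for illustration and is unrelated to the study data.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a questionnaire.

    responses: list of respondents, each a list of k item scores
    (every respondent must answer all k items; k >= 2, and total
    scores must not all be identical).
    """
    k = len(responses[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical two-item questionnaire answered by three respondents.
alpha = cronbach_alpha([[1, 2], [2, 1], [3, 3]])
```

Alpha approaches 1 when items rise and fall together across respondents and drops toward 0 (or below) when they do not, which is why a value such as 0.315 signals that the items are not measuring one construct consistently.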

11
Using Patient iPSC-derived Retinal Pigment Epithelial Cells to Evaluate Differential Susceptibility to MEK Inhibitor-Associated Retinopathy

Lozano, L. P.; Boyce, T. M.; Groves, A. P.; Keen, H. L.; Boldt, H. C.; Mullins, R. F.; Binkley, E. M.; Tucker, B. A.

2026-04-14 pharmacology and toxicology 10.64898/2026.04.11.717944 medRxiv
Top 0.6%
1.3%

Purpose: Compare the effect of MEK inhibition on iPSC-derived retinal pigment epithelial (RPE) cells generated from a patient who developed MEK inhibitor-Associated Retinopathy (MEKAR) versus a patient who did not develop retinopathy. Design: Case-control. Subjects: Two female patients with Neurofibromatosis Type 1 who were treated with MEK inhibitors. One patient developed MEKAR, the other did not. Methods: RPE were generated from human induced pluripotent stem cells (hiPSCs) from these two patients. These hiPSC-derived RPE were treated with selumetinib for 10 days. Main Outcome Measures: Phagocytic activity and changes in gene expression. Results: As previously reported, there was a significant increase in internalized rhodopsin in phagocytosis assays, yet this was only found in hiPSC-derived RPE from the patient who developed MEKAR. Selumetinib decreased expression of genes related to fluid transport and cell volume, including aquaporins and solute transporters. At baseline, cells from the patient without MEKAR had higher expression of these genes. Interestingly, selumetinib-induced changes in gene expression only reached statistical significance in cells from the patient who did not develop MEKAR, suggesting these changes may be a compensatory protective mechanism. Patients susceptible to developing MEKAR may have increased phagocytosis without a compensatory change in expression of genes related to fluid flux, thereby inhibiting their ability to transport fluid out of the subretinal space. Conclusions: MEK inhibitor-Associated Retinopathy may only affect susceptible patients whose retinal pigment epithelium cannot sufficiently regulate expression of genes related to fluid transport and cell volume, altering the ability of these cells to function properly.

12
An atlas of transcriptional dynamics in maternal blood over the course of healthy pregnancy

Feenstra, B.; Hede, F. R. D.; Piening, B. D.; Skotte, L.; Nastou, K.; Liang, L.; Fadista, J.; Rasmussen, M.-L. H.; Scheller, N. M.; Jiang, C.; Vallania, F.; Wei, E.; Liu, Q.; Chaib, H.; Geller, F.; Boyd, H. A.; Snyder, M. P.; Melbye, M.

2026-04-01 genomics 10.64898/2026.03.30.715300 medRxiv
Top 0.7%
1.3%

Pregnancy results in profound physiological changes driven by dynamic and precisely programmed molecular processes. Maternal peripheral blood is generally the specimen of choice for studying these processes, as it is easily accessible and essential for many aspects of maintaining a healthy pregnancy. Here, we present a high-resolution atlas of the dynamic temporal changes in the transcriptome of maternal peripheral blood in healthy human pregnancy. We generated comprehensive RNA sequencing data in 802 weekly samples from 31 healthy pregnant women from the first trimester until after delivery. Using a strict discovery and replication setup, our longitudinal analysis of gene expression identified 720 genes with robust pregnancy-specific expression patterns. Using weighted graph correlation network analysis, we identified nine pregnancy-associated transcriptional modules that reveal a strong, coordinated enrichment of innate/neutrophil and antiviral immune programs, alongside changes in adaptive immunity (T cell differentiation and signaling), erythropoiesis and hemoglobin metabolism. Cell-type deconvolution revealed that these transcriptomic shifts were accompanied by increased relative neutrophil proportions and reduced naive CD4 and CD8 T cells in pregnancy. We provide a comprehensive characterization of dynamic changes across pregnancy, highlighting maternal blood as a key systemic regulator in healthy gestation. Together, our findings establish a reference atlas of healthy pregnancy, which can be used to identify dysregulated processes and mechanisms in women with pregnancy complications.

Graphical abstract: Figure 1 (image not reproduced).

- 720 genes showed robust pregnancy-specific expression patterns.
- Co-expression analysis clustered the genes into nine modules with distinct dynamics.
- Enrichment in pathways involved in innate and neutrophil-mediated immunity, antiviral responses, T cell differentiation and signaling, erythropoiesis and hemoglobin metabolism.
- Cell-type deconvolution showed increases in neutrophils and decreases in naive CD4 and CD8 T cells.
- The atlas of detailed longitudinal transcriptional changes provides a baseline reference for healthy pregnancy.
- Results for all genes and protein-protein interaction networks are made available for interactive exploration.

13
Genetic loss of JAK1 and cutaneous HPV infection

Fan, S.-Q.; Wang, R.-R.; Colombo, R.; Tang, K.-C.; Liu, J.-W.; Pontoglio, A.; Zhang, L.-L.; Li, K.; Han, S.-R.; Zhang, H.; Bai, X.; Yu, X.; Habulieti, X.; Liu, K.-Q.; Sun, Y.; Sun, L.-W.; Liu, H.; Sun, M.; Lin, Z.-M.; Zhang, F.-R.; Ma, D.-L.; Zhang, X.

2026-04-08 genetic and genomic medicine 10.64898/2026.04.03.26350014 medRxiv
Top 0.7%
1.2%

Background: Human papillomaviruses (HPVs) pose a severe threat to global public health by driving nonmelanoma skin cancer (NMSC) and cervical cancer, with NMSC being one of the most common cancers worldwide. Epidermodysplasia verruciformis (EV) is an inborn error of immunity characterized by increased susceptibility to persistent cutaneous HPV infection and a high risk of NMSC. The genetic basis remains unknown in many patients with EV. Methods: We collected four unrelated pedigrees with EV. Genetic analysis identified five variants in JAK1, which encodes Janus kinase 1. Ex vivo models and patient-derived tissue were employed to evaluate the functional effects of JAK1 variants and delineate the pathogenic mechanisms. Results: We identified different variants in JAK1 in four pedigrees with dominant EV. Genetic analysis revealed five novel variants in JAK1, three of which resulted in nonsense-mediated mRNA decay (NMD). Functional assays identified decreased phosphorylation of the signal transducers and activators of transcription (STATs), impaired interferon responses, and defective T cell activation. Immune dysregulation in patients, characterized by a reduced CD4/CD8 T cell ratio, decreased CD8 naive T cell proportion, and accumulated memory T cells, implies impaired antiviral immunity against HPV. Conclusions: Our findings confirm that JAK1 loss-of-function (LOF) variants underlie susceptibility to cutaneous HPV infection. [Funded by the National Natural Science Foundation of China (81788101, 81230015, 82394420, and 82394423), the National Key Research and Development Program of China (2022YFC2703900), the CAMS Innovation Fund for Medical Sciences (2021-I2M-1-018), and the Regione Lombardia, Italy (Innovative Research Project 1137-2010)].

14
IEKB: a comprehensive knowledge base for inner ear genetics integrating curated associations, cochlear interactions, Bayesian candidate prioritisation, explainable dark-gene support relations, and a scientific entity network

Wang, H.; Chen, W.; Ning, H.; Cai, Y.; Xu, Y.; Hou, X.; Pang, L.; Luo, Z.; Tian, C.

2026-04-09 bioinformatics 10.64898/2026.04.06.716823 medRxiv
Top 0.7%
1.2%

Inner-ear genetics has expanded rapidly, yet the supporting evidence remains dispersed across a vast literature and across resources that typically emphasise loci, variants, or expression data rather than integrated biological interpretation. Here we present the Inner Ear Knowledge Base (IEKB; https://earkb.org), an open database that unifies curated associations, cochlear interaction evidence, candidate prioritisation, explainable support relations, and network exploration for inner-ear research. IEKB was built with an automated agent-assisted curation workflow that combines schema-constrained literature extraction, continuous human monitoring, and final expert review by inner-ear genetics researchers. By systematically analysing 250,696 PubMed-indexed records retrieved across 16,563 screened genes, IEKB curates 6,051 gene-phenotype-disease associations from 2,494 genes across 43 phenotype categories and 4,102 cochlear gene-gene interactions with pathway, cell-type, and experimental context. IEKB further includes a Bayesian "dark matter" module that prioritises 243,071 candidate gene-phenotype associations for 13,229 genes across all 43 phenotypes (global AUC-ROC = 0.8603; global AUC-PR = 0.1674), together with a supervised dark-relation layer that ranks phenotype-specific known-gene support for each candidate and a multi-entity scientific network containing nearly 4,000 entities, 28,616 deterministic edges, and 83,712 literature-derived relational links. The web resource supports interactive search, multi-parameter filtering, gene-detail pages, bibliometric exploration, domain-specific enrichment against IEKB phenotype and disease gene sets, network visualisation, bulk download in CSV, JSON, SQLite, and XLSX formats, and natural-language evidence-grounded question answering through a companion conversational interface (IEKB QA). 
To our knowledge, IEKB is the first openly accessible inner-ear resource that integrates curated associations, cochlear interactions, probabilistic candidate prioritisation, auditable known-gene support relations for novel candidates, and a multi-entity scientific network within a single database. All data are released without registration under the CC BY 4.0 license.

15
Evaluating Individual Level Performance of Polygenic Risk Scores Using Early Onset High Genetic Risk Coronary Artery Disease as a Benchmark

Liang, S.; Kim, M. S.; Sui, Y.; Tan, Y.; Li, L.; Cho, S. M.; Koyama, S.; Liu, Y.; Paruchuri, K.; Chan, A.; Honigberg, M.; Natarajan, P.; Chatterjee, N.; Fahed, A. C.; Yu, Z.

2026-04-18 genetic and genomic medicine 10.64898/2026.04.16.26350801 medRxiv
Top 0.7%
1.2%

Polygenic risk scores (PRSs) are typically validated using population-level metrics, which mask variability in individual-level risk prediction and hinder clinical translation. To address this, we introduced a novel framework using a "benchmark" cohort (N=1184) of "unexpected coronary artery disease (CAD)": early-onset patients (<55 years) whose clinical profile (low 10-year risk, no diabetes, and no severe hypercholesterolemia) excludes therapy indications. The occurrence of early CAD in these clinically low-risk individuals establishes a "ground truth" for high genetic risk. We evaluated 58 published CAD PRSs and demonstrated a disconnection between population-level performance and individual-level accuracy (the proportion of benchmark patients captured). The proportion captured by the 58 PRSs varied from 10.8% to 33.1%, and the top-performing score was 2-fold more effective at identifying the benchmark group than established non-genetic biomarkers, such as lipoprotein(a). Furthermore, benchmark patients never captured by any score exhibited significantly healthier lipid profiles. Our framework provides an essential method for validating the clinical readiness of PRSs.

16
From GWAS to drug: A framework for drug candidate prioritisation using a gene expression signature matching approach

Chauquet, S.; Jiang, J.-C.; Barker, L. F.; Hunter, Z. L.; Singh, G.; Wray, N. R.; McRae, A. F.; Shah, S.

2026-04-24 genetic and genomic medicine 10.64898/2026.04.22.26349470 medRxiv
Top 0.8%
1.2%

Drug targets supported by human genetic evidence have significantly higher approval rates, making genome-wide association studies a valuable resource for drug candidate prioritisation. Transcriptome-wide association study signature-matching is an emerging in silico approach that integrates GWAS data with expression quantitative trait loci to generate a disease gene expression signature, which is then compared against drug perturbation databases such as the Connectivity Map. Despite recent adoption, there is no consensus on optimal methodology. Here, we systematically benchmark key parameters, including TWAS method, eQTL tissue model, similarity metric, gene set size, and CMap cell line, using LDL cholesterol, familial combined hyperlipidemia, and asthma as proof-of-concept traits. We demonstrate that while TWAS signature-matching can successfully prioritise known first-line treatments, performance is highly sensitive to parameter choice; for instance, the selection of the cell line used for drug signatures alone can dramatically alter drug prioritisation. Based on these findings, we propose a best-practice framework for robust, genetically-informed drug prioritisation using TWAS signature-matching.

17
Deciphering sepsis molecular subtypes using large-scale data to identify subtype-specific drug repurposing

Smith, L. A.; Augustin, B.; Jacob, V.; Black, L. P.; Bertrand, A.; Hopson, C.; Cagmat, E.; Datta, S.; Reddy, S.; Guirgis, F.; Graim, K.

2026-03-30 bioinformatics 10.64898/2026.03.28.714506 medRxiv
Top 0.8%
1.1%

Sepsis is a life-threatening dysregulated response to infection, the heterogeneity of which precludes effective targeted therapies. To address this, we created a transcriptomic atlas of publicly available adult sepsis data, on which we performed molecular subtyping and identified potential subtype-specific drug repurposing opportunities. In total, we harmonized data from 3,713 samples across 28 datasets, of which 2,251 were from sepsis patients. Using these data, we identified four molecular subtypes of sepsis (C1-C4) by clustering the sepsis samples based on expression differences in immune- and lipid-related genes. We next identified gene signatures unique to each molecular subtype. Pathway analysis of these signatures revealed patterns of immune exhaustion and metabolic dysregulation in C1, suggesting potential benefit from corticosteroid treatment. C2 had the youngest patient population and the lowest mortality, and C2 expression patterns were often anti-correlated with those of C1. C3 was enriched for inflammatory and cellular stress pathways, while the highest-mortality subtype, C4, showed evidence of immunosuppression and metabolic reprogramming. Gene- and pathway-level analyses of our molecular subtypes correlated statistically with results from the analysis of 28-day mortality, with the best (C2) and worst (C4) subtypes exhibiting molecular dysregulation similar to that of survivors and non-survivors, respectively. For each subtype, we then evaluated potential targeted therapies. Using a large-scale pharmacogenomics database, we identified drugs targeting the subtype gene signatures and assessed the potential clinical impacts of these drugs. We identified several potential candidate therapies for each molecular subtype, including possible responsiveness to methylene blue therapy for patients from our highest-mortality subtype, C4. 
Notably, our drug repurposing analysis revealed a significant representation of anti-inflammatory monoclonal antibody therapies across molecular subtypes. The anti-correlated signatures in C1 and C2 suggest that monoclonal antibody therapies may not be effective for patients in both subtypes, which may explain why prior clinical trials have been unsuccessful. Altogether, our detailed molecular subtyping and analysis identify potential drug targets within each molecular subtype, with implications for future precision medicine for sepsis.

18
Polygenic risk scores enhance the identification of carriers of monogenic forms of idiopathic pulmonary fibrosis

Alonso-Gonzalez, A.; Jaspez, D.; Lorenzo-Salazar, J. M.; Delgado, A.; Quintero-Bacallado, A.; Ma, S.-F.; Strickland, E.; Mychaleckyj, J.; Kim, J. S.; Huang, Y.; Adegunsoye, A.; Oldham, J. M.; Maher, T. M.; Guillen-Guio, B.; Wain, L. V.; Allen, R. J.; Saini, G.; Jenkins, R. G.; Molina-Molina, M.; Zhang, D.; Kim Garcia, C.; Martinez, F. J.; Noth, I.; Flores, C.

2026-04-18 genetic and genomic medicine 10.64898/2026.04.16.26350967 medRxiv
Top 0.8%
1.1%

Background: Idiopathic pulmonary fibrosis (IPF) is a rare disease with a poor prognosis. Disease risk involves both rare and common genetic variants; however, an inverse association has been described between them. Accordingly, IPF patients with a higher polygenic risk score (PRS) for IPF are less likely to carry rare deleterious variants and vice versa. Here, we evaluate whether an IPF PRS could serve as an additional criterion for prioritising patients for rare variant discovery. Methods: We identified carriers based on the presence of rare qualifying variants (QVs) in genes linked to monogenic forms of pulmonary fibrosis in 888 IPF patients from the Pulmonary Fibrosis Foundation Patient Registry (PFF-PR). Genome-wide association study (GWAS) summary statistics from independent cohorts were used to construct a whole-genome PRS (WG-PRS) using a clumping and thresholding method (C+T) and a Bayesian method (SBayesRC). PRSs were also derived from 19 known common sentinel IPF variants (Sentinel-PRS). Logistic regression models were used to evaluate associations between PRS and carrier status. Discriminatory performance was evaluated using area under the curve (AUC) analysis, and comparisons were made with the DeLong test. Validation was performed in 472 IPF individuals from the UK PROFILE cohort. Results: IPF-PRSs were strongly associated with QV carrier status: odds ratio [OR] 0.65 (95% confidence interval [CI] 0.53-0.79) for WG-PRS_C+T, OR 0.71 (95% CI 0.59-0.86) for WG-PRS_SBayesRC, and OR 0.77 (95% CI 0.63-0.94) for Sentinel-PRS. Adding WG-PRS to the patient's personal clinical history improved the prediction of QV carriers: AUC=0.62 for the clinical model, AUC=0.68 for WG-PRS_C+T (DeLong test, p=9.54×10⁻⁴) and AUC=0.66 for WG-PRS_SBayesRC (DeLong test, p=0.02). Adding IPF-PRS to clinical variables correctly reclassified 22.8% of carriers when using WG-PRS_C+T, 20.8% when using Sentinel-PRS, and 16.7% when using WG-PRS_SBayesRC. 
WG-PRS_SBayesRC and the Sentinel-PRS also demonstrated improved prediction of QV carriers in telomere-related genes in PROFILE. Conclusions: Incorporating IPF-PRS into a model based on the patient's clinical history improves the identification of QV carriers. Although the overall discriminatory power was moderate, these findings raise the possibility of using WG-PRS as a useful criterion for rare variant discovery in patients with IPF and for enhancing decision-making.

19
Cellector: A tool to detect foreign genotype cells in scRNAseq data with applications in leukemia and microchimerism.

Heaton, H.; Behboudi, R.; Ward, C.; Weerakoon, M.; Kanaan, S.; Reichle, S.; Hunter, N.; Furlan, S.

2026-03-30 bioinformatics 10.64898/2026.03.26.714571 medRxiv
Top 0.8%
1.0%

Rare, genetically distinct cells can occur in various samples, such as those from transplant patients, cases of naturally occurring microchimerism between maternal and fetal tissues, and cancer samples with sufficient mutational burden. Computational methods for detecting these foreign cells are vital to studying these biological conditions. An application of particular interest is that of leukemia patients after hematopoietic cell transplant (HCT). In many leukemias, a primary therapy is HCT, after which the primary genotype of the bone marrow and blood cells should be of donor origin. The presence of cells that have the patient's genotype and belong to the cell type lineage of the particular leukemia is known as measurable residual disease (MRD). If the MRD is high enough, this may represent a relapse of the patient's leukemia. Accurately estimating the MRD is therefore important for driving clinical decision-making for these patients. Here we present Cellector, a computational method for identifying rare foreign genotype cells in single-cell RNAseq (scRNAseq) datasets. We show that Cellector accurately detects microchimeric cells even when they are present at an exceedingly low percentage (0.05% or lower).

20
Inherited genetic risk factors in young-onset lung cancer

Esai Selvan, M.; Gould Rothberg, B. E.; Patel, A. A.; Sang, J.; Horowitz, A.; Christiani, D. C.; Klein, R. J.; Gumus, Z. H.

2026-04-15 genetic and genomic medicine 10.64898/2026.04.14.26350822 medRxiv
Top 0.9%
1.0%

Introduction: Lung cancer is rare before age 45, and its inherited genetic basis remains poorly defined. Methods: We performed whole-genome sequencing in 171 predominantly young-onset lung cancer patients and integrated these data with whole-exome sequencing from six major lung cancer consortia, yielding 9,065 patients. After quality control, analyses focused on 6,545 individuals of European ancestry, the largest ancestral group. We compared the prevalence of rare pathogenic and likely pathogenic (P/LP) germline variants between 186 young-onset (age <45 years) and 6,359 older patients at gene and gene-set levels using Fisher's exact test, stratified by histology, sex, and smoking status. Polygenic risk scores (PRS) derived from common variants were also evaluated. Results: Young-onset patients carried a higher burden of rare germline P/LP variants in DNA damage response (DDR) genes (including BRIP1, ERCC6, MSH5) and in cilia-related genes, notably GPR161. At the pathway level, DDR genes were significantly enriched (OR=1.66, p=0.007), with the strongest signal in the Fanconi anemia pathway and among females (OR=1.96, p=0.01). Enrichment was also observed in inborn errors of immunity pathways, with the strongest signals in antibody deficiency and complement system genes. Young-onset patients additionally exhibited higher lung cancer PRS. Conclusion: Young-onset lung cancer exhibits a distinct germline genetic architecture, characterized by enrichment of rare P/LP variants in DDR, cilia-related, and immune pathways, and an elevated lung cancer PRS. These findings support a greater role for inherited susceptibility in early-onset disease and have implications for risk stratification, earlier screening, and precision prevention.